Abstract

Existing economic models support the estimation of the
costs and benefits of developing and evolving a Software
Product Line (SPL) as compared to undertaking traditional
software development approaches. In addition, Feature
Diagrams (FDs) are a valuable tool to scope the domain of a
SPL. This paper proposes an algorithm to calculate, from
a FD, the following information for economic models: the
total number of products of a SPL, the SPL homogeneity and
the commonality of the SPL requirements. The algorithm
running time belongs to the complexity class $O(f^4 2^c)$. In
contrast to related work, the algorithm is free of dependencies
on off-the-shelf tools and is specified in general terms for an
abstract FD notation that works as a pivot language for
most of the available notations for feature modeling.

1. Introduction 

Software Product Line (SPL) practice is a widely used
approach for the efficient development of whole portfolios
of software products [16]. However, the SPL approach is not
always the best economic choice for developing a family of
related systems. The domain of a SPL must be carefully
scoped, identifying the common and variable requirements
of its products and the interdependencies between
requirements. In a badly scoped domain, relevant requirements may
not be implemented, and some implemented requirements
may never be used, causing unnecessary complexity and
unnecessary development and maintenance costs [6]. To avoid
these serious problems, SPL domains are usually modeled
by means of Feature Diagrams (FDs). Moreover, decision
makers must be able to predict the costs and benefits of
developing and evolving a SPL as compared to undertaking
traditional development approaches. Thus, domain models
are used in conjunction with existing economic models,
such as the Structured Intuitive Model for Product Line
Economics (SIMPLE) [5] and the Constructive Product Line
Investment Model (COPLIMO) [4], to estimate SPL costs
and benefits.

A fundamental input parameter for economic models, which
can be inferred from domain models, is the total number
of products of a SPL. For instance, SIMPLE estimates
the cost of building a SPL using equation (1), where $C_{org}$
expresses how much it costs for an organization to adopt
the SPL approach, $C_{cab}$ is the cost of developing the SPL
core asset base (see footnote 1), $n$ is the number of products of the SPL,
$C_{unique}(product_i)$ is the cost of developing the unique parts
of a product, and $C_{reuse}(product_i)$ is the development cost
of reusing core assets to build a product.

$$C_{SPL} = C_{org} + C_{cab} + \sum_{i=1}^{n} \left( C_{unique}(product_i) + C_{reuse}(product_i) \right) \tag{1}$$

Another interesting SIMPLE metric is homogeneity, which
provides an indication of the degree to which a SPL is
homogeneous (i.e., how similar the SPL products are).
Homogeneity is calculated by equation (2), where $n$ is the
number of products of the SPL, $\|R_U\|$ is the number of
requirements unique to one product, and $\|R_T\|$ is the total
number of different requirements.

$$Homogeneity_{SPL} = 1 - \frac{\|R_U\|}{\|R_T\|} \tag{2}$$



Hence, unique SPL requirements $R_U$ must be identified in
order to calculate homogeneity. Nevertheless, not only the
distinction between common and unique requirements is of
interest, but also the relative importance of any requirement
to the SPL, i.e., its commonality [2]. The commonality of
a requirement $R_i$ is calculated by equation (3), where $\|P_{R_i}\|$
is the number of products that implement the requirement
and $n$ is the total number of products of the SPL.

$$Commonality_{R_i} = \frac{\|P_{R_i}\|}{n} \tag{3}$$
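To make equations (1) to (3) concrete, the following Python sketch evaluates them on an explicitly enumerated toy SPL. It is only an illustration under stated assumptions: the function names (`spl_cost`, `homogeneity`, `commonality`) and the representation of each product as a set of requirement names are hypothetical choices, not part of SIMPLE.

```python
from fractions import Fraction


def spl_cost(c_org, c_cab, unique_costs, reuse_costs):
    """Equation (1): adoption cost + core asset base cost + per-product costs."""
    if len(unique_costs) != len(reuse_costs):
        raise ValueError("one unique cost and one reuse cost per product")
    return c_org + c_cab + sum(u + r for u, r in zip(unique_costs, reuse_costs))


def homogeneity(products):
    """Equation (2): 1 - ||R_U|| / ||R_T|| for a list of requirement sets."""
    all_reqs = set().union(*products)  # R_T: every different requirement
    unique = {r for r in all_reqs if sum(r in p for p in products) == 1}  # R_U
    return 1 - Fraction(len(unique), len(all_reqs))


def commonality(requirement, products):
    """Equation (3): share of products that implement the requirement."""
    return Fraction(sum(requirement in p for p in products), len(products))


# Hypothetical SPL with three products (assumed data, for illustration only).
toy_products = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
print(spl_cost(10, 20, [1, 2, 3], [4, 5, 6]))
print(homogeneity(toy_products))
print(commonality("b", toy_products))
```

Enumerating products explicitly is only feasible for tiny SPLs; the rest of the paper is about obtaining the same figures from a FD without enumeration.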



This paper proposes a time-efficient algorithm to calculate, 
from a FD that scopes the domain of a SPL, the total 
number of products of the SPL, the SPL homogeneity and 
the commonality of the SPL requirements. In order to make 
our proposal as general as possible, the algorithm is specified 
for an abstract notation for FDs, named Neutral Feature 
Tree (NFT), that works as a pivot language for most of the 
available notations for feature modeling. 

1. According to the SPL approach, products are built from a core asset 
base, a collection of artifacts that have been designed specifically for reuse 
across the SPL. 



The remainder of this paper is structured as follows.
Section 2 formally defines the abstract syntax and semantics
of NFT. Sections 3 and 4 present the sketch of the algorithm.
Section 5 compares our work to related research on the
automated analysis of FDs. Finally, Section 6 summarizes
the paper and outlines directions for future work.

2. An abstract notation for modeling SPL variability

Since the first FD notation was proposed by the FODA
methodology in 1990 [13], a number of extensions and
alternative languages have been devised to model variability
in families of related systems:

1) As part of the following methods: FORM [14],
FeatureRSEB [10], Generative Programming [6], Software
Product Line Engineering [?], PLUSS [?].

2) In the work of the following authors: M. Riebisch et
al. [19], J. van Gurp et al. [24], A. van Deursen et al.
[23], H. Gomaa [?].

3) As part of the following tools: Gears [?] and
pure::variants [18].

Unfortunately, this profusion of languages hinders
efficient communication among specialists and the portability
of FDs between tools. In order to face this problem, P.
Schobbens et al. [20], [12], [15] propose the Varied Feature
Diagram+ (VFD+) as a pivot notation for FDs. VFD+
is expressively complete and most FD notations can be
easily and efficiently translated into it. VFD+ diagrams are
single-rooted Directed Acyclic Graphs (DAGs). However,
our algorithm takes advantage of FDs structured as trees. For
that reason, we propose the usage of a VFD+ subset, named
Neutral Feature Tree (NFT), where diagrams are restricted
to be trees.

In this section we formally define NFT. Concretely:
Section 2.1 outlines the main parts of a formal language;
Sections 2.2 and 2.3 define the abstract syntax and semantics
of NFT, respectively; and Section 2.4 demonstrates the
equivalence between NFT and VFD+. We emphasize that NFT
is not meant as a user language, but only as a formal
"back-end" language used to define our algorithm in a general way.

2.1. Anatomy of a formal language 

According to J. Greenfield et al. [9], the anatomy of a
formal language includes an abstract syntax, a semantics
and one or more concrete syntaxes.

1) The abstract syntax of a language characterizes, in an
abstract form, the kinds of elements that make up the
language, and the rules for how those elements may be
combined. All valid element combinations supported
by an abstract syntax make up the syntactic domain $\mathcal{L}$
of a language.

2) The semantics of a language defines its meaning.
According to D. Harel et al. [11], a semantic definition
consists of two parts: a semantic domain $\mathcal{S}$ and a
semantic mapping $\mathcal{M}$ from the syntactic domain to
the semantic domain. That is, $\mathcal{M} : \mathcal{L} \to \mathcal{S}$.

3) A concrete syntax defines how the language elements
appear in a concrete, human-usable form.

The following sections define the abstract syntax and
semantics of NFT. Most FD notations may be considered as concrete
syntaxes or "views" of NFT.

2.2. Abstract syntax of NFT 

A NFT diagram $d \in \mathcal{L}_{NFT}$ is a tuple $(N, \Sigma, r, DE, \lambda, \phi)$,
where:

1) $N$ is the set of nodes of $d$, among which $r$ is the root. Nodes
are meant to represent features. The idea of feature is
widely used in domain engineering and it has
been defined as a "distinguishable characteristic of a
concept (e.g., system, component and so on) that is
relevant to some stakeholder of the concept" [6].

2) $\Sigma \subseteq N$ is the set of terminal nodes (i.e., the leaves of
$d$).

3) $DE \subseteq N \times N$ is the set of decomposition edges;
$(n_1, n_2) \in DE$ is alternatively denoted $n_1 \to n_2$. If
$n_1 \to n_2$, $n_1$ is the parent of $n_2$, and $n_2$ is a child of
$n_1$.

4) $\lambda : (N - \Sigma) \to card$ labels each non-leaf node
$n$ with a card boolean operator. If $n$ has children
$n_1, \ldots, n_s$, $card_s[i..j](n_1, \ldots, n_s)$ evaluates to true if
at least $i$ and at most $j$ of the $s$ children of $n$ evaluate
to true. Regarding the card operator, the following
points should be taken into account (see footnote 2):

a) whereas many FD notations distinguish between
mandatory, optional, or and xor dependencies, the
card operator generalizes these categories. For
instance, Figure 1 depicts equivalences between
the feature notation proposed by K. Czarnecki et
al. [6] and NFT.

b) whereas, in many FD notations, children nodes
may have different types of dependencies on
their parent, in NFT all children must have the
same type of dependency. This apparent
limitation can be easily overcome by introducing
auxiliary nodes. For instance, Figure 2 depicts
the equivalence between a feature model and a
NFT diagram. Node A has three children and two
types of dependencies: $A \to B$ is mandatory and
$(A \to C, A \to D)$ is a xor-group. In the NFT
diagram, the different types of dependencies are
modeled by introducing the auxiliary node aux.

5) $\phi$ are additional textual constraints (see footnote 3) written in
propositional logic over any type of node ($\phi \in \mathbb{B}(N)$).

2. The same considerations are valid for VFD+.

Additionally, $d$ must satisfy the following constraints:

1) Only $r$ has no parent: $\forall n \in N \cdot (\exists n' \in N \cdot n' \to n) \Leftrightarrow n \neq r$.

2) $d$ is a tree. Therefore,

a) a node may have at most one parent:
$\forall n \in N \cdot (\forall n', n'' \in N \cdot ((n' \to n) \wedge (n'' \to n) \Rightarrow n' = n''))$

b) $DE$ is acyclic: $\nexists n_1, n_2, \ldots, n_k \in N \cdot n_1 \to n_2 \to \ldots \to n_k \to n_1$.

3) card operators are of adequate arities:
$\forall n \in N \cdot (\exists n' \in N \cdot n \to n') \Rightarrow (\lambda(n) = card_s) \wedge (s = \|\{(n, n') \mid (n, n') \in DE\}\|)$
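As an informal companion to the definition above, the following Python sketch shows one possible in-memory representation of a NFT diagram and a check of the structural constraints (single parent, no cycles, cardinality bounds compatible with the number of children). The `Node` class, its field names and the `check_tree` helper are hypothetical; textual constraints $\phi$ are deliberately left out. The example instance encodes the NFT diagram described for Figure 2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A NFT node; leaves keep low = high = 0 and no children."""
    name: str
    low: int = 0    # i in card_s[i..j]
    high: int = 0   # j in card_s[i..j]
    children: List["Node"] = field(default_factory=list)


def check_tree(root):
    """Return the set of node names if the diagram is a well-formed tree."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node.name in seen:
            # reached twice: the node has two parents or lies on a cycle
            raise ValueError(f"{node.name} has two parents or lies on a cycle")
        seen.add(node.name)
        s = len(node.children)
        if s and not (0 <= node.low <= node.high <= s):
            raise ValueError(f"{node.name}: card bounds do not fit {s} children")
        stack.extend(node.children)
    return seen


# Figure 2 as described in the text: A -> B mandatory, (C, D) a xor-group.
aux = Node("aux", 1, 1, [Node("C"), Node("D")])  # card_2[1..1]
a = Node("A", 2, 2, [Node("B"), aux])            # card_2[2..2]
print(sorted(check_tree(a)))
```

Node identity is approximated here by the node name, which is an assumption of this sketch rather than something the formal definition requires.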



Figure 1. card operator generalizes mandatory, optional,
or and xor dependencies



Figure 2. Different types of dependencies between
a node and its children can be expressed in NFT by
introducing auxiliary nodes

2.3. Semantics of NFT

Feature diagrams are meant to represent sets of products,
and each product is seen as a combination of terminal
features. Hence, the semantic domain of NFT is $\mathcal{P}(\mathcal{P}(\Sigma))$,
i.e., a set of sets of terminal nodes. The semantic mapping
of NFT ($\mathcal{M}_{NFT} : \mathcal{L}_{NFT} \to \mathcal{P}(\mathcal{P}(\Sigma))$) assigns a SPL to
every feature diagram $d$, according to the following definitions:

1) A configuration is a set of features, that is, any element
of $\mathcal{P}(N)$. A configuration $c$ is valid for a $d \in \mathcal{L}_{NFT}$
iff:

a) The root is in $c$ ($r \in c$).

b) The boolean value associated to the root is true.
Given a configuration, any node of a diagram
has an associated boolean value according to the
following rules:

i) A terminal node $t \in \Sigma$ evaluates to true if
it is included in the configuration ($t \in c$);
otherwise it evaluates to false.

ii) A non-terminal node $n \in (N - \Sigma)$ is
labeled with a card operator. If $n$ has children
$n_1, \ldots, n_s$, $card_s[i..j](n_1, \ldots, n_s)$ evaluates to
true if at least $i$ and at most $j$ of the $s$
children of $n$ evaluate to true.

c) The configuration must satisfy all textual
constraints $\phi$.

d) If a non-root node is in the configuration, its
parent must be too.

2) A product $p$, named by a valid configuration $c$, is the
set of terminal features of $c$: $p = c \cap \Sigma$.

3) The SPL represented by $d \in \mathcal{L}_{NFT}$ consists of the
products named by its valid configurations ($SPL \in \mathcal{P}(\mathcal{P}(\Sigma))$).

3. also named cross-tree constraints [2].

2.4. Equivalence between NFT and VFD+

NFT differs from VFD+ in the following points:

1) Terminal nodes vs. primitive nodes. As noted by
some authors [?], there is currently no agreement on
the following question: are all features equally relevant
to define the set of possible products that a feature
diagram stands for? In VFD+, P. Schobbens et al. have
adopted a neutral formalization: the modeler is
responsible for specifying which nodes represent features that
will influence the final product (the primitive nodes
$P$) and which nodes are just used for decomposition
($N - P$). P. Schobbens points out that primitive nodes
are not necessarily equivalent to leaves, though it is
the most common case. However, a primitive node
$p \in P$, labeled with $card_s[i..j](n_1, \ldots, n_s)$, can always
become a leaf ($p \in \Sigma$) according to the following
transformation $T_{P \to \Sigma}$:

a) $p$ is substituted by an auxiliary node $aux_1$.

b) the children of $aux_1$ are $p$ and a new auxiliary
node $aux_2$.

c) $aux_1$ is labeled with $card_2[2..2](p, aux_2)$.

d) $p$ becomes a leaf. The children of $aux_2$ are the former
children of $p$.

e) $aux_2$ is labeled with the former
$card_s[i..j](n_1, \ldots, n_s)$ of $p$.

Figure 3 depicts the conversion of a primitive non-leaf
node B into a leaf node.

2) DAGs vs. trees. Whereas diagrams are trees in NFT, in
VFD+ they are DAGs. However, a node $n$ with $s$ parents
($n_1, \ldots, n_s$) can be translated into a node $n$ with one
parent $n_1$ according to the following transformation
$T_{DAG \to tree}$:

a) $s - 1$ auxiliary nodes $aux_2, \ldots, aux_s$ are added to
the diagram.

b) edges $n_2 \to n, \ldots, n_s \to n$ are replaced by new
edges $n_2 \to aux_2, \ldots, n_s \to aux_s$.

c) D. Batory [1] demonstrated how to translate any
edge $a \to b$ into a propositional logic formula
$\phi_{a,b}$. Using Batory's equivalences, implicit edges
$aux_2 \to n, \ldots, aux_s \to n$ are converted into textual
constraints $\phi_{aux_2,n}, \ldots, \phi_{aux_s,n}$ and are added
to $\phi$ ($\phi' = \phi \wedge \phi_{aux_2,n} \wedge \ldots \wedge \phi_{aux_s,n}$).

Figure 4 depicts the conversion of a node D with two
parents B and C into a node with a single parent.




Figure 3. Any primitive non-leaf node can be converted
into a leaf node by using $T_{P \to \Sigma}$

Figure 4. Any DAG can be converted into a tree by
using $T_{DAG \to tree}$

In order to identify when a transformation on a diagram
keeps (1) the diagram semantics and (2) the diagram
structure, P. Schobbens [20] proposes the following definition
of graphical embedding: "a translation $T : \mathcal{L} \to \mathcal{L}'$ that
preserves the semantics of $\mathcal{L}$ and is node-controlled, i.e., $T$
is expressed as a set of rules of the form $d \to d'$, where $d$
is a diagram containing a defined node or edge $n$, and all
possible connections with this node or edge. Its translation $d'$
is a subgraph in $\mathcal{L}'$, plus how the existing relations should be
connected to nodes of this new subgraph". According to this
definition, $T_{P \to \Sigma}$ and $T_{DAG \to tree}$ are graphical embeddings
that guarantee the equivalence between NFT and VFD+.

3. Calculating the products in a NFT diagram
without textual constraints

This section presents how to calculate the total number
of products of a SPL modeled by a NFT diagram without
considering textual constraints.

The number of products of a node $n$ is denoted as
$P(n)$. Thus, the total number of products represented by
a NFT diagram is $P(r)$, where $r$ is the root. For a leaf
node $l$, $P(l) = 1$. Table 1 includes equations to calculate
$P(n)$ for a non-leaf node $n$ that has $s$ children $n_i$ of type
mandatory (i.e., $n$ is labeled with $card_s[s..s]$), optional
($card_s[0..s]$), xor ($card_s[1..1]$) and or ($card_s[1..s]$). Hence,
the time-complexity for calculating $P(n)$ in these cases is $O(s)$.
Therefore, the time-complexity for computing $P(r)$ is linear in
the number of nodes of the diagram, i.e., $O(N)$.

Table 1. Number of products for mandatory, optional,
or and xor relationships

type of relationship            formula
mandatory ($card_s[s..s]$)      $P(n) = \prod_{i=1}^{s} P(n_i)$
optional ($card_s[0..s]$)       $P(n) = \prod_{i=1}^{s} (P(n_i) + 1)$
or ($card_s[1..s]$)             $P(n) = \left(\prod_{i=1}^{s} (P(n_i) + 1)\right) - 1$
xor ($card_s[1..1]$)            $P(n) = \sum_{i=1}^{s} P(n_i)$

In general, when a node $n$ has $s$ children and is labeled
with $card_s[low..high]$, $P(n)$ is calculated by equation (4),
where $S_k$ is the number of products obtained by choosing any
combination of $k$ children from $s$. For the sake of clarity, let
us denote $P(n_1), P(n_2), \ldots, P(n_s)$ as $p_1, p_2, \ldots, p_s$. In a
straightforward approach, $S_k$ can be calculated by summing the
number of products of all possible $k$-combinations (see
equation (5)). Unfortunately, this calculation has exponential
time-complexity.

$$P(n) = \sum_{k=low}^{high} S_k \tag{4}$$

$$S_k = \sum_{1 \le i_1 < i_2 < \cdots < i_k \le s} p_{i_1} p_{i_2} \cdots p_{i_k} \tag{5}$$
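The closed formulas of Table 1 and the straightforward evaluation of equations (4) and (5) can be compared with a short Python sketch. The function names and the `kind` strings are hypothetical; the child counts 7, 3 and 7 are the ones used later in the worked example of this section. The brute-force variant enumerates every $k$-combination, so it exhibits exactly the exponential behaviour the text warns about.

```python
from itertools import combinations
from math import prod


def products_brute_force(child_counts, low, high):
    """Equations (4) and (5): add up every k-combination for low <= k <= high."""
    total = 0
    for k in range(low, high + 1):
        # prod(()) == 1, which matches the base case S_0 = 1
        total += sum(prod(combo) for combo in combinations(child_counts, k))
    return total


def products_table1(child_counts, kind):
    """Closed formulas of Table 1 for the four classical relationship types."""
    if kind == "mandatory":                      # card_s[s..s]
        return prod(child_counts)
    if kind == "optional":                       # card_s[0..s]
        return prod(p + 1 for p in child_counts)
    if kind == "or":                             # card_s[1..s]
        return prod(p + 1 for p in child_counts) - 1
    if kind == "xor":                            # card_s[1..1]
        return sum(child_counts)
    raise ValueError(f"unknown relationship type: {kind}")


counts = [7, 3, 7]
s = len(counts)
print(products_table1(counts, "or"), products_brute_force(counts, 1, s))
```

Both computations agree on every relationship type, which is a convenient sanity check before replacing the brute-force sum with a faster recurrence.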

A better complexity can be reached by using recurrence
equations. The base case is $S_0 = 1$. According to
equation (5), $S_1 = \sum_{i=1}^{s} p_i$. When calculating $S_2$, the number of
products for combinations of 2 siblings that include $n_1$ is

$$p_1 p_2 + p_1 p_3 + \cdots + p_1 p_s = p_1 (p_2 + p_3 + \cdots + p_s) = p_1 (S_1 - p_1).$$

Similarly, the number of products of 2-combinations that
include $n_2$ is $p_2 (S_1 - p_2)$. Adding up every 2-combination,
we get $\sum_{i=1}^{s} p_i (S_1 - p_i)$. However, in the sum each term
$p_i p_j$ is being accounted for twice: once in the round for $i$
and once in the round for $j$. Thus, removing the redundant
calculations:

$$S_2 = \frac{1}{2} \sum_{i=1}^{s} p_i (S_1 - p_i) = \frac{1}{2} \left( S_1^2 - \sum_{i=1}^{s} p_i^2 \right)$$


Figure 5. A sample diagram, with $\lambda(A) = card_3[1..3]$,
$\lambda(B) = card_3[1..3]$, $\lambda(C) = card_2[1..2]$, $\lambda(D) = card_3[1..3]$
and $\phi = (E \Rightarrow H) \wedge (G \Rightarrow H) \wedge (J \Rightarrow I)$



When calculating $S_3$, the number of products for combinations of
3 siblings that include $n_1$ is $p_1$ multiplied by the number
of products for 2-combinations that do not contain $n_1$, i.e.,
$p_1 (S_2 - p_1 (S_1 - p_1))$. Adding up every 3-combination, we
get:

$$\sum_{i=1}^{s} p_i \left( S_2 - p_i (S_1 - p_i) \right) = S_2 S_1 - S_1 \sum_{i=1}^{s} p_i^2 + \sum_{i=1}^{s} p_i^3$$

This time, every triple $p_i p_j p_k$ is being accounted for three
times. Hence, removing the redundant computations:

$$S_3 = \frac{1}{3} \left( S_2 S_1 - S_1 \sum_{i=1}^{s} p_i^2 + \sum_{i=1}^{s} p_i^3 \right)$$

Our reasoning leads to the general equation (6), which has a
much better time-complexity, $O(ks)$. Combining equations
(4) and (6), we conclude that the total number of products of
a SPL represented by a NFT diagram can be calculated,
without considering textual constraints, in quadratic time,
i.e., $O(N^2)$, which constitutes a considerable improvement
from exponential to polynomial computational complexity.

$$S_0 = 1$$

$$S_k = \frac{1}{k} \sum_{i=0}^{k-1} \left( (-1)^{k-i-1} S_i \sum_{j=1}^{s} p_j^{k-i} \right) \quad \text{for } 1 \le k \le s \tag{6}$$
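Equation (6) has the shape of Newton's identities, which relate the sums over $k$-combinations to the power sums $\sum_j p_j^e$. A minimal Python sketch of the recurrence follows; the function names `combination_sums` and `products_for_cardinality` are hypothetical, and the sketch precomputes the power sums once so that each $S_k$ only needs $k$ further terms.

```python
def combination_sums(child_counts, k_max=None):
    """Return [S_0, S_1, ..., S_k_max] computed with the recurrence of equation (6)."""
    s = len(child_counts)
    k_max = s if k_max is None else min(k_max, s)
    # power_sums[e] = sum_j p_j^e (index 0 is never used by the recurrence)
    power_sums = [sum(p ** e for p in child_counts) for e in range(k_max + 1)]
    sums = [1]  # base case S_0 = 1
    for k in range(1, k_max + 1):
        acc = 0
        for i in range(k):
            acc += (-1) ** (k - i - 1) * sums[i] * power_sums[k - i]
        sums.append(acc // k)  # the accumulated value is always a multiple of k
    return sums


def products_for_cardinality(child_counts, low, high):
    """Equation (4): P(n) for a node labeled card_s[low..high]."""
    sums = combination_sums(child_counts, high)
    return sum(sums[low:high + 1])


print(combination_sums([7, 3, 7]))
print(products_for_cardinality([7, 3, 7], 1, 3))
```

Integer division is safe here because each $S_k$ is an integer by construction; working with Python integers also avoids the overflow concerns that fixed-width arithmetic would raise for large diagrams.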

Let us consider the diagram in Figure 5. Ignoring the textual
constraints in this example, it is easy to compute that nodes
B and D generate 7 products each and C generates 3. Since A
has or cardinality, we could use the corresponding equation
in Table 1 and then $P(A) = (7+1)(3+1)(7+1) - 1 = 255$. By way of
example, we will compute the same amount using equation
(6). We begin by computing the powers of the number of
products of A's children and their sums:


power     B      C      D      sum
1         7      3      7      17
2         49     9      49     107
3         343    27     343    713


Now, $S_0 = 1$ by definition, $S_1 = 17$, as it is the sum of
the children's products, $S_2 = \frac{1}{2}(17 \cdot 17 - 1 \cdot 107) = 91$,
following the general formula (6), and $S_3 = \frac{1}{3}(91 \cdot 17 -
17 \cdot 107 + 1 \cdot 713) = 147$. Adding up $S_1$, $S_2$ and $S_3$, we get
again 255.



We will now tackle another question that may be skipped
in a first reading but which will be of interest in the next section.
Suppose we have a node $N$ with $n$ children whose numbers of
products are, respectively, $p_1, p_2, \ldots, p_n$. Suppose we have
already computed $P(N)$ using equation (6). This calculation
would provide us with vector $S$. What would happen if
we added a new child with $p_{n+1}$ products? We may
compute a new vector $S'$ using the general equation, but it
is possible to derive $S'_i$ from $S_i$ directly, for any suitable $i$.
Obviously, $S'_i$ will contain all the possibilities in $S_i$, since
all of them are valid combinations of $i$ children of $N$. These
are the combinations in $S'_i$ which do not include the new
node. The combinations including the new child amount to
$p_{n+1} \cdot S_{i-1}$. So, $S'_i = S_i + p_{n+1} \cdot S_{i-1}$.

What we really want to do is exactly the opposite, that
is, having computed $S'$, eliminate a child and compute the
vector $S$.

$$S_0 = 1$$

$$S_i = S'_i - p_{n+1} \cdot S_{i-1} \tag{7}$$

We already know $S'$ and $S_0 = 1$ by definition. Now we
can iteratively compute $S_1$, $S_2$ and so on, using equation
(7). Going back to our previous example, say we want to
eliminate node C. Now $S_0 = 1$ by definition, $S_1 = 17 - 3 \cdot
1 = 14$, $S_2 = 91 - 3 \cdot 14 = 49$ and $S_3 = 147 - 3 \cdot 49 = 0$
(as expected, since there are only two siblings left).
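The incremental update and its inverse, equation (7), translate into two short loops. The following Python sketch is an illustration under assumed names (`add_child`, `remove_child`); the vectors are plain lists indexed by the number of chosen children.

```python
def add_child(sums, p_new):
    """S'_i = S_i + p_new * S_(i-1); the result has one more entry than `sums`."""
    extended = sums + [0]  # no combination of the old children uses s+1 of them
    return [extended[0]] + [
        extended[i] + p_new * extended[i - 1] for i in range(1, len(extended))
    ]


def remove_child(sums_prime, p_removed):
    """Equation (7): S_i = S'_i - p_removed * S_(i-1), starting from S_0 = 1."""
    sums = [1]
    for i in range(1, len(sums_prime)):
        sums.append(sums_prime[i] - p_removed * sums[i - 1])
    return sums


# Vector of the worked example (children with 7, 3 and 7 products).
s_prime = [1, 17, 91, 147]
print(remove_child(s_prime, 3))
print(add_child([1, 14, 49], 3))
```

Note that `remove_child` keeps the length of its input, so the last entry becomes 0 once a child is gone, as in the example above.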

4. Computing the number of products, commonality
and homogeneity with textual constraints

Usual SPL conceptualizations allow two types of
constraints: require and exclude. We shall not restrict the
constraints to anything other than standard propositional logic
formulae.

If the constraint $C$ is in conjunctive normal form, that is,
$C = C_1 \wedge C_2 \wedge \ldots \wedge C_m$, such that each $C_i$ is a disjunction of
literals, let $D_i = \neg C_i$. Then, $D_i$ is a conjunction of literals.


Let $P(n, C)$ be the number of products in a SPL with root
$n$, satisfying the constraint $C$. This function possesses two
interesting properties which we will use to our advantage:

$$P(n, C) = P(n, true) - P(n, \neg C)$$

$$P(n, D_1 \vee D_2) = P(n, D_1) + P(n, D_2) - P(n, D_1 \wedge D_2)$$

It is then easy to prove that

$$P\left(n, \bigvee_{i=1}^{m} D_i\right) = \sum_{K \subseteq \{1..m\} \wedge K \neq \emptyset} (-1)^{\|K\|+1} P\left(n, \bigwedge_{j \in K} D_j\right)$$

Now,

$$P(n, C) = P(n, true) - P(n, \neg C)$$
$$= P(n, true) - P(n, D_1 \vee D_2 \vee \ldots \vee D_m)$$
$$= P(n, true) - \sum_{K \subseteq \{1..m\} \wedge K \neq \emptyset} (-1)^{\|K\|+1} P\left(n, \bigwedge_{j \in K} D_j\right)$$
$$= \sum_{K \subseteq \{1..m\}} (-1)^{\|K\|} P\left(n, \bigwedge_{j \in K} D_j\right)$$

If we define $D^K = \bigwedge_{j \in K} D_j$, then

$$P(n, C) = \sum_{K \subseteq \{1..m\}} (-1)^{\|K\|} P(n, D^K)$$
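The inclusion-exclusion step can be checked numerically with a small Python sketch. To keep it self-contained, the sketch makes a simplifying assumption that is not part of the paper: the tree structure is ignored, so $P(n, true)$ is just the number of assignments of a flat set of boolean features. Clauses are lists of `(feature, value)` literals; all function names are hypothetical.

```python
from itertools import combinations, product


def count_conjunction(literals, variables):
    """Count assignments of `variables` that satisfy a conjunction of literals."""
    fixed = {}
    for name, value in literals:
        if fixed.setdefault(name, value) != value:
            return 0  # the conjunction contains a literal and its negation
    return 2 ** (len(variables) - len(fixed))


def count_cnf_inclusion_exclusion(clauses, variables):
    """P(C) = sum over K of (-1)^|K| * P(D^K), with D_j the negation of clause j."""
    negated = [[(name, not value) for name, value in clause] for clause in clauses]
    total = 0
    for size in range(len(clauses) + 1):
        for subset in combinations(range(len(clauses)), size):
            literals = [lit for j in subset for lit in negated[j]]
            total += (-1) ** size * count_conjunction(literals, variables)
    return total


def count_cnf_brute_force(clauses, variables):
    """Reference count obtained by trying every assignment."""
    count = 0
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[n] == v for n, v in clause) for clause in clauses):
            count += 1
    return count


features = ["E", "G", "H", "I", "J"]
cnf = [
    [("E", False), ("H", True)],  # E => H
    [("J", False), ("I", True)],  # J => I
    [("G", False), ("H", True)],  # G => H
]
print(count_cnf_inclusion_exclusion(cnf, features), count_cnf_brute_force(cnf, features))
```

The point of the rewriting is visible in `count_conjunction`: once the constraint is a conjunction of literals, counting becomes a matter of fixing some features, which is what the tree-based algorithm exploits.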

This way, we have reduced our initial problem of computing
the number of products in a SPL with an unrestricted
constraint to several problems in which the constraint is a
conjunction of literals. As an abuse of notation, we will
often drop the $D$ in $P(n, D^K)$ and write simply $P(n, K)$
wherever the context is clear.

Next, we will define a series of useful concepts.
Informally, we will say that a node $n$ is selected under a
particular constraint $D^K$, and we will simply denote it by
$Sel(n, K)$, iff the particular restriction plus the structure of
the tree and the associated cardinalities force the feature
to be present in the products. Even if $n$ does not occur in
$D^K$, $n$ may be selected because some of its child nodes are.
Likewise, a node $n$ will be deselected under a constraint
$D^K$, denoted by $Desel(n, K)$, iff $n$ does not belong to
any product satisfying $D^K$, be it because it is negated in
$D^K$, because the cardinality required for its child nodes is
impossible to achieve or because one of its children is a
contradicting node.

The constraint $D^K$ being a conjunction of literals, we
will represent it by two sets, namely $A_K$ for the affirmed
literals and $N_K$ for the negated literals. We also define
the nodes in the subtree of a node by the function $F$. If
$n$ is a node with $s$ children (with $s$ possibly being zero)
$n_1, n_2, \ldots, n_s$, then

$$F(n) = \bigcup_{i=1}^{s} \left( \{n_i\} \cup F(n_i) \right)$$

It is computationally expensive to determine the number
of products in a subtree of a SPL, given that we iterate
over all subsets of $K$. Thus, we will restrict the textual
constraints only to those that are relevant for the particular
nodes. In order to do that, we shall define $C(n)$ as the set
of constraints which affect node $n$. Let $M$ be $\{1, 2, \ldots, m\}$,
then

$$C(n) = (\{n\} \cup F(n)) \cap (A_M \cup N_M)$$

The calculations for any given node other than the root of
the tree will not involve iterating over every subset of $M$, as
there are $2^{\|M\|}$ of them, but only over every subset of $C(n)$.

Before we formally define Sel and Desel, we will also
introduce some convenient, self-explanatory abbreviations:

$$Present(n_i, K) \equiv Sel(n_i, K) \wedge \neg Desel(n_i, K)$$

$$Absent(n_i, K) \equiv \neg Sel(n_i, K) \wedge Desel(n_i, K)$$

$$Contradicting(n_i, K) \equiv Sel(n_i, K) \wedge Desel(n_i, K)$$

$$Potential(n_i, K) \equiv \neg Sel(n_i, K) \wedge \neg Desel(n_i, K)$$

It is desirable, for a node $n$ and a constraint $D^K$, to be able to
classify its child nodes according to these four possibilities.
Absent nodes are not going to play a very interesting
role, but the rest of them will. Let $K_i$ be $K \cap C(n_i)$.
This is the subset of cross-tree constraints in $K$ which
are relevant to child $n_i$ of $n$. The set of present nodes is
$PRE(n, K) = \{n_i : Present(n_i, K_i)\}$, where $1 \le i \le s$.
We need to count how many nodes there are in each
category; we will call count-pre(n, K), count-pot(n, K)
and count-con(n, K) the number of present,
potential and contradicting nodes, respectively. The present
factor, which we will abbreviate as pre-fac, is
$\prod_{n_i \in PRE(n, K)} P(n_i, K_i)$.
The potential factor is the cardinality of the potential subset
as explained in the previous section, with low and high
readjusted to account for the present nodes. Now, let us
formally define Sel, Desel and $P(n, K)$ for a node $n$ with
cardinality card[low..high]. For leaf nodes, we just consider
low and high to be zero.

$$Sel(n, K) = n \in A_K \vee \bigvee_{i=1}^{s} Sel(n_i, K_i)$$

and

$$Desel(n, K) = n \in N_K \vee \text{count-pre}(n, K) + \text{count-pot}(n, K) < low \vee \text{count-pre}(n, K) > high \vee \text{count-con}(n, K) > 0$$
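The four abbreviations and the Desel rule are simple boolean combinations, as the following Python sketch illustrates. The enum `Status` and the functions `classify` and `is_deselected` are hypothetical names; the counters are assumed to have been gathered from the children beforehand, and the contradicting-children test is assumed to be "greater than zero", in line with the informal definition of deselection.

```python
from enum import Enum


class Status(Enum):
    PRESENT = "present"
    ABSENT = "absent"
    CONTRADICTING = "contradicting"
    POTENTIAL = "potential"


def classify(sel, desel):
    """Map the pair (Sel, Desel) of a node to one of the four categories."""
    if sel and desel:
        return Status.CONTRADICTING
    if sel:
        return Status.PRESENT
    if desel:
        return Status.ABSENT
    return Status.POTENTIAL


def is_deselected(negated, count_pre, count_pot, count_con, low, high):
    """Desel(n, K) from the counters of present, potential and contradicting children."""
    return (
        negated                          # n is in N_K
        or count_pre + count_pot < low   # the lower bound cannot be reached
        or count_pre > high              # the upper bound is already exceeded
        or count_con > 0                 # some child is contradicting
    )


print(classify(True, False), is_deselected(False, 0, 2, 0, 1, 2))
```

For a leaf, all counters and both bounds are zero, so only an explicit negation in $D^K$ can deselect it.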

The amount $P(n, K)$ will be the product of the
present factor and the potential factor, provided there are
no contradicting children.

Another interesting economic metric for SPLs is the
commonality of its features. To carry out this calculation for
a given feature, the number of products in which the feature
appears is needed. If $n$ is a feature of a SPL and $m \in F(n)$,
we define $P(n, m, D^K)$ as the number of products of the
SPL with root $n$ that contain the feature $m$. This amount
is really $P(n, D^K \wedge m)$. Therefore, we could follow the
indications in the former part of this section to carry out
the calculation. However, it is convenient to visit each node
in the tree just once for each $K$ value in order to keep
computational complexity manageable.

We will first compute commonality for the children $n_i$ of a
particular node $n$, and then we will extend that computation
to every other node in $F(n)$. For a child $n_i$ of $n$ which is
not contradicting or absent under $K$, the number of products
depends on its siblings, that is, $P(n, n_i, K)$ will be again
the product of a present factor and a potential factor, only
this time $n_i$ will be considered as selected. If $n_i$ is present,
the node was already selected and nothing changes with respect to
the computation of $P(n, K)$, so $P(n, n_i, K) = P(n, K)$.
If $n_i$ is potential, we will have to consider it as present. We
will multiply pre-fac by $P(n_i, K_i)$ to get the new present
factor and we will eliminate $n_i$ from the potential factor
as explained in the last part of the previous section, using
equation (7) to get the new potential factor, which we will
call new-pot-fac. Note that this promotion from potential
to present can only be carried out if cardinality allows
it, i.e., it is not possible if count-pre = high. In that case
$P(n, n_i, K) = 0$, just as if $n_i$ were an absent node.

To compute $P(n, n_j, K)$ where $n_j \in F(n_i)$ we will
proceed as in the computation of $P(n, n_i, K)$ except that
the role of $P(n_i, K_i)$ will be played by $P(n_i, n_j, K_i)$.

To recapitulate, if $n_i$ is absent or contradicting,
obviously $P(n, n_i, K) = 0$, and so is $P(n, n_j, K)$ for
every $n_j \in F(n_i)$. If $n_i$ is present under $K$, then
$P(n, n_i, K) = P(n, K)$ and $P(n, n_j, K) = P(n_i, n_j, K_i) \cdot
P(n, K) / P(n_i, K_i)$. The most difficult case is when $n_i$ is
potential with respect to $K$. In that case, we compute $P(n, n_i, K)$
as usual, only extracting $n_i$ from the list of potential
nodes whose cardinality is to be computed (because it will
act as a present node) and multiplying said cardinality by
the old pre-fac and by $P(n_i, K_i)$. For an $n_j \in F(n_i)$,
$P(n, n_j, K) = \text{pre-fac} \cdot P(n_i, n_j, K_i) \cdot \text{new-pot-fac}$.
The next subsection presents all these ideas in pseudocode (see footnote 4).

4.1. Algorithm specification in pseudocode 

procedure spl(n : node) {
  // call the children recursively
  C(n) = ∅;
  foreach child n_i of n do {
    spl(n_i); C(n) = C(n) ∪ C(n_i); }
  C(n) = C(n) ∪ {j | n appears in constraint #j};
  // iterate over all subsets of C(n)
  foreach K subset of C(n) do {
    Compute A_K and N_K;
    count-pre = count-con = count-pot = 0;
    pre-fac = 1;
    Sel(n, K) = Desel(n, K) = false;
    pot-list = ∅;
    foreach child n_i of n do {
      K_i = C(n_i) ∩ K;
      if Present(n_i, K_i) {
        count-pre++; pre-fac = pre-fac * P(n_i, K_i); }
      else if Potential(n_i, K_i) {
        count-pot++; pot-list.add(P(n_i, K_i)); }
      else if Contradicting(n_i, K_i)
        count-con++;
      if Sel(n_i, K_i)
        Sel(n, K) = true;
    } // foreach child
    if n ∈ A_K
      Sel(n, K) = true;
    if n ∈ N_K
      Desel(n, K) = true;
    else if count-pre > n.high
      Desel(n, K) = true;
    else if count-pre + count-pot < n.low
      Desel(n, K) = true;
    else if count-con > 0
      Desel(n, K) = true;
    // classify n under K
    Present(n, K) = Sel(n, K) ∧ ¬Desel(n, K);
    Potential(n, K) = ¬Sel(n, K) ∧ ¬Desel(n, K);
    Absent(n, K) = ¬Sel(n, K) ∧ Desel(n, K);
    Contradicting(n, K) = Sel(n, K) ∧ Desel(n, K);
    // compute P(n, K)
    if Present(n, K) or Potential(n, K) {
      nlow = max(n.low - count-pre, 0);
      nhigh = max(n.high - count-pre, 0);
      (pot-fac, S) = cardinality(pot-list, nlow, nhigh);
      P(n, K) = pre-fac * pot-fac;
    }
    else P(n, K) = 0;
    if Present(n, K) or Potential(n, K) {
      // compute P(n, n_i, K) for n_i ∈ F(n)
      foreach n_i child of n do {
        K_i = C(n_i) ∩ K;
        if Present(n_i, K_i) {
          P(n, n_i, K) = P(n, K);
          foreach n_j ∈ F(n_i) do
            P(n, n_j, K) =
              P(n_i, n_j, K_i) * P(n, K) / P(n_i, K_i); }
        else if Contradicting(n_i, K_i) {
          P(n, n_i, K) = 0;
          foreach n_j ∈ F(n_i) do
            P(n, n_j, K) = 0; }
        else if Potential(n_i, K_i) ∧ count-pre ≠ n.high {
          nnlow = max(nlow - 1, 0);
          nnhigh = nhigh - 1;
          new-pot-fac =
            eliminate(S, P(n_i, K_i), nnlow, nnhigh);
          P(n, n_i, K) = pre-fac * P(n_i, K_i) * new-pot-fac;
          foreach n_j ∈ F(n_i) do
            P(n, n_j, K) =
              P(n, n_i, K) * P(n_i, n_j, K_i) / P(n_i, K_i); }
        else { // Absent(n_i, K_i)
          P(n, n_i, K) = 0;
          foreach n_j ∈ F(n_i) do
            P(n, n_j, K) = 0; }
      } // foreach n_i child of n
      if n is the root node
        foreach n_i ∈ F(n) do
          // accumulate with the inclusion-exclusion sign of K
          P(n, n_i) = P(n, n_i) + (-1)^||K|| * P(n, n_i, K);
    } // if Present or Potential
    else {
      foreach n_i child of n do {
        P(n, n_i, K) = 0;
        foreach n_j ∈ F(n_i) do
          P(n, n_j, K) = 0; }
    }
  } // foreach subset of C(n)
}

Main program

spl(root);
homogeneity = 1;
if P(root) ≠ 0 {
  foreach node in the diagram do {
    commonality(node) = P(root, node) / P(root);
    if P(root, node) = 1
      homogeneity = homogeneity - 1/P(root);
  }
}
else There are no products in the SPL

4. An executable prototype of the algorithm with source code is
available on http://www.issi.uned.es/miembros/pagpersonales/rubenjxeradio/rheradio_english.html

4.2. An example 

Let us consider again the diagram in Figure 5. We
enumerate the constraints $E \Rightarrow H$, $J \Rightarrow I$ and $G \Rightarrow H$ as 1, 2 and
3, respectively. We will use a bottom-up approach in order
to show that iterating $K$ over all subsets of the restricted
constraints is rather manageable.

Leaf node values are trivial; they yield one product
except when they are explicitly negated. B has E, F and
G as children. Thus, $F(B) = \{E, F, G\}$. The constraints that
apply to B are 1 and 3. Hence, we will need to compute
$P(B, \emptyset)$, $P(B, \{1\})$, $P(B, \{3\})$ and $P(B, \{1, 3\})$.

$D^{\emptyset}$ is simply true. We can use the simplified version of
the cardinality function, therefore

$$P(B, \emptyset) = 2^3 - 1 = 7$$

The first constraint, $E \Rightarrow H$, is equivalent to $\neg E \vee H$ and,
if we negate it, we get $E \wedge \neg H$, which will be our $D^{\{1\}}$.
In order to compute $P(B, D^{\{1\}})$, we have to compute the
number of products of the child nodes, as before. Since E,
F and G are leaf nodes, they still yield one product unless
negated. What has changed is that now E is a present node.
Therefore,

$$P(B, \{1\}) = P(E, \{1\}) \cdot (P(F, \{1\}) + 1) \cdot (P(G, \{1\}) + 1) = 1 \cdot (1 + 1) \cdot (1 + 1) = 4$$

Symmetrically, $P(B, \{3\}) = 4$. Next we compute
$P(B, \{1, 3\})$. This time, both E and G are present nodes,
thus:

$$P(B, \{1, 3\}) = P(E, \{1, 3\}) \cdot (P(F, \{1, 3\}) + 1) \cdot P(G, \{1, 3\}) = 2$$

For C the situation is similar, but this time the three textual
constraints are applicable (although #1 and #3 have the same
effect on C).

$$P(C, \{2, 3\}) = 1 \times 1 - 1 = 0$$

$$P(C, \{1, 2, 3\}) = 1 \times 1 - 1 = 0$$







For node D, the only cross-tree constraint applicable is 2.

$$P(D, \emptyset) = 2 \times 2 \times 2 - 1 = 8 - 1 = 7$$

$$P(D, \{2\}) = 1 \times 2 \times 2 = 4$$

Now that we have collected all the necessary data about B, C and D, we finish the calculation of the number of products. Let us call Z the conjunction of all the cross-tree constraints. Then,

$$Z = (\neg G \lor H) \land (\neg J \lor I) \land (\neg E \lor H)$$

$$\begin{aligned}
P(A,Z) ={} & P(A,\emptyset) - P(A,\{1\}) - P(A,\{2\}) \\
& - P(A,\{3\}) + P(A,\{1,2\}) + P(A,\{1,3\}) \\
& + P(A,\{2,3\}) - P(A,\{1,2,3\})
\end{aligned}$$

Under $D^{\emptyset}$, B, C and D are potential nodes. Any subset of constraints that includes 1 or 3 makes B present, any subset that includes 2 makes D present, and any subset that includes 1 and 2, or 3 and 2, makes C absent, so we may proceed.



$$\begin{aligned}
P(A,\emptyset) &= (7 + 1) \cdot (3 + 1) \cdot (7 + 1) - 1 = 255 \\
P(A,\{1\}) &= 4 \cdot (1 + 1) \cdot (7 + 1) = 64 \\
P(A,\{2\}) &= (7 + 1) \cdot (1 + 1) \cdot 4 = 64 \\
P(A,\{3\}) &= 4 \cdot (1 + 1) \cdot (7 + 1) = 64 \\
P(A,\{1,2\}) &= 4 \cdot 1 \cdot 4 = 16 \\
P(A,\{1,3\}) &= 2 \cdot (1 + 1) \cdot (7 + 1) = 32 \\
P(A,\{2,3\}) &= 4 \cdot 1 \cdot 4 = 16 \\
P(A,\{1,2,3\}) &= 2 \cdot 1 \cdot 4 = 8
\end{aligned}$$







Therefore, 



P(A, Z) = 255 - 64 - 64 - 64 + 16 + 32 + 16 - 8 = 119 
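The alternating sum is an instance of the inclusion-exclusion principle: subsets with an odd number of negated constraints are subtracted and those with an even number are added. A minimal Python sketch, using the eight values of this example as input (the dictionary layout and the function name are illustrative choices, not part of the algorithm's specification):

```python
from itertools import combinations

# P(A, S) for every subset S of the negated constraints 1, 2 and 3.
P_A = {
    (): 255,
    (1,): 64, (2,): 64, (3,): 64,
    (1, 2): 16, (1, 3): 32, (2, 3): 16,
    (1, 2, 3): 8,
}


def inclusion_exclusion(values, constraint_ids):
    """Sum values[S] over all subsets S, with sign (-1)^|S|."""
    total = 0
    for size in range(len(constraint_ids) + 1):
        sign = -1 if size % 2 else 1
        for subset in combinations(constraint_ids, size):
            total += sign * values[subset]
    return total


print(inclusion_exclusion(P_A, (1, 2, 3)))  # 119
```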

Now it is time to compute how many products a certain feature appears in. We shall calculate $P(A, E, Z)$. If we start with $D^{\emptyset}$, all the nodes except the root are potential, so $P(B, \emptyset) = 7$ and $P(A, B, \emptyset) = P(B, \emptyset) \cdot \mathrm{card}_{[0..2]}^{\emptyset}(\{C, D\}) = 7 \cdot 4 \cdot 8 = 224$. Then, $P(A, E, \emptyset) = P(B, E, \emptyset) \cdot \mathrm{card}_{[0..2]}^{\emptyset}(\{C, D\}) = 4 \cdot 32 = 128$. Under $D^{\{1\}}$, B and E become present, so $P(A, E, \{1\}) = P(A, B, \{1\}) = P(A, \{1\}) = 64$. Due to space limitations, we summarize the final result:

$$\begin{aligned}
P(A,E,Z) ={} & P(A,E,\emptyset) - P(A,E,\{1\}) \\
& - P(A,E,\{2\}) - P(A,E,\{3\}) \\
& + P(A,E,\{1,2\}) + P(A,E,\{1,3\}) \\
& + P(A,E,\{2,3\}) - P(A,E,\{1,2,3\}) \\
={} & 128 - 64 - 32 - 32 + 16 + 32 + 8 - 8 \\
={} & 48
\end{aligned}$$



Hence, the commonality for E would be $48/119 \approx 0.403$, which means that this feature appears in roughly 40% of the products of this SPL. Finally, to calculate SPL homogeneity (see equation [2]), we should identify the unique features, i.e., those that appear in exactly one product and therefore have a commonality value of $1/P(A, Z) = 1/119$.
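The totals of this example can be cross-checked by brute force. The Python sketch below enumerates every feature selection and keeps the valid ones. It assumes a reading of the diagram in which A, B, C and D are all or-groups (a node is selected exactly when at least one of its children is), B has children E, F and G, C has children H and I, and D has three children of which only J is named in the text; the names K and L are hypothetical placeholders. Constraints 1, 2 and 3 are taken to be E ⇒ H, J ⇒ I and G ⇒ H.

```python
from itertools import product

# Assumed tree shape; K and L are placeholder names for D's unnamed children.
TREE = {
    "A": ["B", "C", "D"],
    "B": ["E", "F", "G"],
    "C": ["H", "I"],
    "D": ["J", "K", "L"],
}
# Cross-tree constraints 1, 2 and 3 as (antecedent, consequent) pairs.
REQUIRES = [("E", "H"), ("J", "I"), ("G", "H")]
NON_ROOT = [f for kids in TREE.values() for f in kids]  # every feature but A


def is_product(selected):
    """True if the selection respects the tree and every constraint."""
    for parent, kids in TREE.items():
        has_child = any(k in selected for k in kids)
        if (parent in selected) != has_child:
            return False
    return all(a not in selected or b in selected for a, b in REQUIRES)


def all_products():
    """Enumerate the valid products; the root A is always selected."""
    result = []
    for bits in product([False, True], repeat=len(NON_ROOT)):
        selected = {"A"} | {f for f, bit in zip(NON_ROOT, bits) if bit}
        if is_product(selected):
            result.append(selected)
    return result


products = all_products()
with_e = sum("E" in p for p in products)
print(len(products), with_e, round(with_e / len(products), 3))  # 119 48 0.403
```

Enumeration visits all 2^11 selections, so it is only practical for small diagrams; its purpose here is to confirm the values obtained by the algorithm.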

5. Computational Complexity and Related Work

The restriction of the constraints relevant for a node can be computed in linear time by taking the union of those of its children plus the constraints involving the node itself. The cardinalities for a node can be computed in quadratic time, as seen in section [5]. The information needed for commonality is computed inside the exponential loop over all the subsets of a particular K. This takes time proportional to $n^3 2^{\|c(n)\|}$ for each node. As every node undergoes this treatment, the time complexity for the whole SPL is $O(n^4 2^m)$, where n is the number of nodes and m is the number of conjuncts in the conjunctive form of the textual constraints. It is a heavy computation. Even so, this algorithm improves on previously proposed ones, which computed fewer properties than ours and required running times exponential in the sum of the number of nodes and the number of constraints (limited to the requires or excludes flavors).
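The first step, collecting the constraints relevant for each node, can be sketched as a single post-order traversal. The Python code below is an illustration only: the dictionary-based tree encoding and the names `tree`, `mentions` and `relevant_constraints` are assumptions, and the data reproduces the example of section 4.2 with constraints 1, 2 and 3 taken to be E ⇒ H, J ⇒ I and G ⇒ H.

```python
def relevant_constraints(tree, mentions, root):
    """Map every node to the ids of the constraints that involve the node
    itself or any of its descendants (one post-order pass)."""
    relevant = {}

    def visit(node):
        acc = set(mentions.get(node, ()))
        for child in tree.get(node, ()):
            acc |= visit(child)  # union with the child's restriction
        relevant[node] = acc
        return acc

    visit(root)
    return relevant


# Only the nodes named in the example; leaves have no entry in `tree`.
tree = {"A": ["B", "C", "D"], "B": ["E", "F", "G"], "C": ["H", "I"], "D": ["J"]}
mentions = {"E": {1}, "H": {1, 3}, "G": {3}, "J": {2}, "I": {2}}

result = relevant_constraints(tree, mentions, "A")
print(result["B"], result["C"], result["D"])  # {1, 3} {1, 2, 3} {2}
```

Each node is visited once, which matches the restrictions used in the example: constraints 1 and 3 for B, all three for C, and only 2 for D.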

To the best of our knowledge, available commercial tools for SPL development, such as Gears [3] and pure::variants [18], implement neither homogeneity nor commonality. In addition, they do not consider textual constraints in the calculation of the number of products.

On the other hand, there are academic proposals for calculating commonality, homogeneity and the total number of products from a FD with textual constraints. Considering that a FD is composed of a graphical part ($g = d - \phi$) and a logical part ($\phi$), most research works on the automated analysis of FDs follow one of these strategies:

1) Translating the graphical part into logic formulas (i.e., $T_{g \to \phi'}$) and using off-the-shelf tools to process the formulas and the textual constraints ($\phi \land \phi'$). For instance:

• D. Batory [1] proposes a translation of FDs into propositional logic. The resulting formulas are processed by off-the-shelf Logic-Truth Maintenance Systems (LTMS) and Boolean Satisfiability (SAT) solvers.

• D. Benavides [2] devises an abstract conversion of FDs into Constraint Satisfaction Problems (CSP). FaMa Tool Suite [12] adapts this abstract conversion to general CSP solvers, SAT solvers and Binary Decision Diagrams (BDD) solvers.

Unfortunately, as P. van den Broek et al. [22] have pointed out, the computation of $\phi \land \phi'$ to calculate the total number of products has exponential time complexity in the size of the full FD, i.e., it belongs to the complexity class $O(2^{g + \phi})$.
2) Embedding the textual constraints into the graphical part (i.e., $T_{\phi \to g'}$) and computing the number of products from the extended graphical part. For example, P. van den Broek et al. [22] devise an algorithm for $T_{\phi \to g'}$ where the elimination of a constraint can double the size of g. The time complexity of the algorithm is linear in the size of g and exponential in the number of textual constraints, i.e., it belongs to $O(g \, 2^{\phi})$. Compared to our work, P. van den Broek's proposal has the following limitations:

• Only supports $\mathrm{card}_s[s..s]$, $\mathrm{card}_s[0..s]$, $\mathrm{card}_s[1..s]$ and $\mathrm{card}_s$

• Textual constraints are limited to "A requires B" (i.e., $A \Rightarrow B$) and "A excludes B" (i.e., $A \Rightarrow \neg B$).

Table 2 presents a comparison between our proposal and the related work summarized in this section.

Table 2. Comparison between our proposal and related work



6. Conclusions and Future Work 

Existing economic models support the estimation of the costs and benefits of developing and evolving a SPL as compared to undertaking traditional software development approaches. In addition, FDs are a popular and valuable tool to scope the domain of a SPL. In this paper, we have proposed an algorithm to infer, from a FD, the following parameters and metrics fundamental for economic models: the total number of products of the SPL, the SPL homogeneity and the commonality of the SPL requirements. Instead of defining our algorithm for a specific FD notation, we have used an abstract notation named NFT, which works as a pivot language for most of the available notations for feature modeling. NFT is formally defined in this paper and can serve as a reference notation for specifying FD analysis algorithms that take advantage of the tree organization of FDs. Compared to related work, our algorithm has a general application scope and a competitive computational time complexity, and it is free of dependencies on off-the-shelf logic tools, such as LTMS and SAT solvers.

In the future, we plan to devise a prototype showing some improvements over the algorithm proposed in this paper. One useful and relatively simple extension would be suggesting changes to the user whenever a FD is unsatisfiable (i.e., the SPL total number of products = 0). It seems viable to propose minimal sets of textual constraints to be eliminated in order to obtain actual products out of the SPL. On the other hand, our efforts do not aim at improving the complexity of the algorithm, but rather its performance. For instance, in Figure 6 the descendants of node A can be divided into two forests whose textual constraints do not cross. In future work, we will try to process each of these unconnected forests separately, ignoring the textual constraints not involved in a forest and thus saving costly exponential computation.



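[Figure 6: feature diagram in which node A has cardinality $\mathrm{card}_4[2..3]$, node E has cardinality $\mathrm{card}_1[0..1]$, and the textual constraints are $(F \Rightarrow \neg G) \land (J \Rightarrow K)$]

Figure 6. A 2-component diagram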